EE2v2 differential methylation via tiling array

xml session: http://eagle.fish.washington.edu/bivalvia/array/2013.11.22.mgavery/array_MG_session.xml

also joined these DMRs with other gene info (e.g. SPID, description, length of gene, #exons etc) and am starting to annotate them a little more here: http://eagle.fish.washington.edu/bivalvia/array/2013.11.22.mgavery/DMRvisAnnotated_CGIstats_TJGRGeneSPID.csv

Examples:

DMRs in cortotropin releasing factor receptor - promoter and first intron

5-hydroxytryptamine receptor 1B - middle exon

ATP binding cassete family - middle intron

"B" Grade DMR - no A01 peak, but similar pattern across the gene

1st analysis

analysis from R. Basom (from the Hutch):

I've attached your results, which were generated via the following method, using the Bioconductor package, "Ringo":

* Read NimbleGen pair files into R

* Remove control probes

* Tukey biweight mean center log2 ratios within each array (all ratios were formulated such that the estrogen treated sample is always in the numerator)

* Smooth log2 ratio values by computing a running median, using a window size of 1000 bases

* Within each array, use the log2 ratio distribution to calculate the threshold to be used for peak detection, then detect peaks such that at least 3 adjacent probes must be above threshold.

* Identify the overlap between A02 and A03, excluding A01 using the Bioconductor package "ChIPpeakAnno." There are 96 regions from A02 that overlap with 99 regions from A03 - but not A01.

The results can be viewed in IGV. If you don't already have an IGV genome configured for the Pacific Oyster, there's an IGVgenome subdirectory that contains one, though the oyster fasta file will have to be placed in this same directory (there are instructions in the README). There's also a subdirectory of genomeBrowserTracks, that has wig files with the smoothed values from each array, as well as bed files with the peaks that were detected for A01, as well as files that have peaks that were detected in both A02 and A03, but not A01. I've loaded the genome browser tracks into IGV and have done some configuration to assist with visualization. If you'd like to use the configuration I've put together, you can load the session file that's attached in the results. There's a screen shot to provide an example of how I've configured the tracks in IGV. There's also an Excel file that has the 96 regions from A02 that overlap with A03, but not A01. If you already have IGV open, this file has a column of hyperlinks that can be clicked on to direct IGV to each overlap region.

folder: http://eagle.fish.washington.edu/bivalvia/index.php?dir=array%2F2013.09.27.mgavery%2F

Hypermethylated Regions:

Analysis with mRNA hits

91 regions overlapped w/ mRNA

genefish:bin Mackenzie$ ./intersectBed -a /Volumes/web-1/bivalvia/wholegenomefiles_MBDbsSeq_gill/gffs/oyster_v9_glean_final_rename_mRNA.gff -b /Volumes/web-1/bivalvia/array/2013.09.27.mgavery/2013.09.27.A03overlapWithA02.bed > /Volumes/web-1/bivalvia/array/arrayRegions_mRNA.txt

genefish:bin Mackenzie$ wc /Volumes/web-1/bivalvia/array/arrayRegions_mRNA.txt

91 819 6013 /Volumes/web-1/bivalvia/array/arrayRegions_mRNA.txt

Pulled the 91 regions into galaxy and imported fasta of oyster genes (oyster_v9_gene.fasta)

Did a fasta to tabular, joined on the gene ID, cut the gene ID and sequences only to a new file and converted tabular to fasta.

fasta of genes with differential methylation can be found here: http://eagle.fish.washington.edu/bivalvia/array/genesWithDiffMeth.fasta

SAME ANALYSIS as above BUT W/ all gene hits (i.e. mRNA)

82 unique SPIDs went into DAVID (((here is link to all annotated regions (duplicates were removed before being put into DAVID) http://eagle.fish.washington.edu/bivalvia/array/arrayRegions_mRNA_annotated.txt )))

82 SPIDs put in, given 76 DAVID IDs

ALL OYSTER GENES ARE USED AS BACKGROUND ~~SEE BELOW FOR ONLY GENES ON THE ARRAY AS BACKGROUND

these processes include 9 SPIDs, which are annotated here: http://eagle.fish.washington.edu/bivalvia/array/genesID_DAVIDenrichment_mRNA.xlsx

snapshot

joined the SP_accessions to GO terms in SQLshare and put GO terms into ReviGO. Visualization below:

DAVID with only genes on array (http://eagle.fish.washington.edu/bivalvia/array%20design/GenesOnArray_SPIDs_only.csv) as background

Not many differences from above (Check significance after correction, none are below 0.05) Only addition is regulation of neurotransmitter levels. The EXACT SAME 9 SPIDs contributed to the enriched categories so the ReviGO plots above would be the same.

Hypomethylated Regions:

As it turns out the above 96 regions are hypermethylated in the EE2 treated sample, there are another 287 regions that are hypomethylated in the EE2 treated.

When joined to mRNA 268 regions overlapped with mRNA (these may not be unique regions of the 287 since mRNAs also overlap)

I identified the genes that overlapped with these hypomethylated regions and annotated them to their SPIDs. There were 201 unique SPIDs in this list. I used those as the gene list and all genes on the array as background for enrichment analysis in DAVID

Here is the output of enriched processes:

http://eagle.fish.washington.edu/bivalvia/array/2013.10.03.mgaveryHypomethlated/DAVIDenrichment_hypomethinEE2.txt

Here are the enriched gene IDs:

http://eagle.fish.washington.edu/bivalvia/array/2013.10.03.mgaveryHypomethlated/DAVIDenrichment_hypomethinEE2.txt

11/01/13

I am interested in seeing if the differentially methylated regions are primarily in exons or introns or if they cross an intron exon boundary:

Initial summary:

96 hypermethylated regions (compared to control): 90 overlapped with mRNA

287 hypomethylated regions : 256 overlapped with mRNA

~~it is not surprising that this may overlapped with genes since the tiling array was designed to cover genes (+2kb upstream)~~

How many regions overlapped with intron/exon boundaries??

First, designed a gff to span 10 bp upstream from an intron (aka 10 bp upstream of an exon intron boundary):

bedtools code: genefish:bin Mackenzie$ ./flankBed -i /Volumes/web-1/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff -g /Volumes/web-1/cnidarian/qDOD_scaffold_length.txt -l 10 -r 0 > /Volumes/web-1/bivalvia/array/flankIntron.txt

Then, performed an intersect bed to count the number of times a differentially methylated region intersected one of these boundaries

bedtools code for hypermethylated: genefish:bin Mackenzie$ ./intersectBed -c -a /Volumes/web-1/bivalvia/array/2013.09.27.mgavery/2013.09.27.A03overlapWithA02.bed -b /Volumes/web-1/bivalvia/array/flankIntron.gff > /Volumes/web-1/bivalvia/array/2013.09.27.mgavery/intersect_boundaries_hypermeth.txt

bedtools code for hypomethylated: genefish:bin Mackenzie$ ./intersectBed -c -a /Volumes/web-1/bivalvia/array/2013.10.03.mgaveryHypomethlated/2013.10.03.A03overlapA02_butNotA01_hypomethylated_bed.bed -b /Volumes/web-1/bivalvia/array/flankIntron.gff > /Volumes/web-1/bivalvia/array/2013.10.03.mgaveryHypomethlated/countofboundary_hypometh.txt

Summary:

Of the 90 hypermethylated regions overlapping with a gene 47 (or 52%) crossed an intron/exon boundary (a majority only crossed one boundary, but up to 6 boundaries were crossed by a single region)

Of the 256 hypomethylated regions overlapping with a gene 122 (or 48%) overlapped 1 or more boundaries (similar to hypermethylated regions, a majority only crossed one boundary, but up to 6 boundaries were crossed by a single region)

How many differentially methylation regions overlap ONLY with coding regions?

I pulled out the subset of regions that did overlap with mRNA, but did NOT overlap with an exon/intron boundary and performed an intersect bed to count how many regions overlapped with just an exon (and can also draw conclusions that those that do not overlap only overlap with an intron).

hypermethylated: there were 43 regions that overlapped with mRNA, but did NOT cross an intron/exon boundary. I did 2 intersectBed with these 43 regions - 1 with the exon.gff 1 with the intron.gff. What I found was the 32 overlapped with an intron only 6 overlapped with an exon only and an additional 5 came up as intersecting with both an intron and an exon (this probably has something to do with the gff I created to identify boundaries (only 10bp upstream from intron).

But! to summarize: 96 regions: 90 are in genes (52 cross exon/intron boundary another 32 are in introns only and 6 are just in exons), 6 are outside of gene

hypomethylated: there were 134 regions that overlapped with mRNA, but did NOT cross an intron/exon boundary. I did 2 intersectBed with these 134 regions - 1 with the exon.gff 1 with the intron.gff. What I found was the 114 overlapped with an intron only 4 overlapped with an exon only and an additional 16 came up as intersecting with both an intron and an exon (this probably has something to do with the gff I created to identify boundaries (only 10bp upstream from intron).

But! to summarize: 287 regions: 256 are in genes (138 cross exon/intron boundary another 114 are in introns only and 4 are just in exons), 31 are outside of gene

files I worked with:

intron and exon canonical files in trilobite

intron/exon boundary gff I generated (described above)

generated bed files with

How big are these regions?

min: 231 bp

max: 7,979 bp

average: 983 bp

Determine how to validate these?

-Is there a way to pick the 'best' regions to validate? (log2s?)

-How much would it cost to do pyrosequencing of these regions?

How many CG sites per regions?

Did some intersect beds

1. count the number of CG per DMR

genefish:bin Mackenzie$ ./intersectBed -c -a /Volumes/web-2/bivalvia/array/2013.09.27.mgavery/2013.09.27.A03overlapWithA02.bed -b /Volumes/web-2/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff >/Users/Mackenzie/Desktop/hyperarray_CG.txt

2. overlap DMR with mRNAs using the -wb option to get gene id for each DMR

genefish:bin Mackenzie$ ./intersectBed -wb -a /Volumes/web-2/bivalvia/array/2013.09.27.mgavery/2013.09.27.A03overlapWithA02.bed -b /Volumes/web-2/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff >/Users/Mackenzie/Desktop/hyperarray_mRNA.txt

3. Then joined these in SQL share to get the length of regions, number of CG and gene ID for these DMR. The file is saved here: http://eagle.fish.washington.edu/bivalvia/array/2013.09.27.mgavery/hyper_CG_mRNA_annotated.csv

Interestingly, 3 of the regions did not contain a CG. Why were they identified as being differentially methylated?